UNIFORM IN BANDWIDTH CONSISTENCY OF KERNEL-TYPE FUNCTION ESTIMATORS

By Uwe Einmahl and David M. Mason

Vrije Universiteit Brussel and University of Delaware

We introduce a general method to prove uniform in bandwidth
consistency of kernel-type function estimators. Examples include the
kernel density estimator, the Nadaraya-Watson regression estimator
and the conditional empirical process. Our results may be useful to
establish uniform consistency of data-driven bandwidth kernel-type
function estimators.

1. Introduction and statements of main results. Let $X, X_1, X_2, \ldots$ be
i.i.d. $\mathbb{R}^d$, $d \ge 1$, valued random variables and assume that the common
distribution function of these variables has a Lebesgue density function, which
we shall denote by $f$. A kernel $K$ will be any measurable function which
satisfies the conditions



(K.i) $\int_{\mathbb{R}^d} K(s)\,ds = 1$,

(K.ii) $\|K\|_\infty := \sup_{x \in \mathbb{R}^d} |K(x)| = \kappa < \infty$.

The kernel density estimator of $f$ based upon the sample $X_1, \ldots, X_n$ and
bandwidth $0 < h < 1$ is

$\hat f_{n,h}(x) = (nh)^{-1} \sum_{i=1}^n K((x - X_i)/h^{1/d})$, $x \in \mathbb{R}^d$.



Choosing a suitable bandwidth sequence $h_n \to 0$ and assuming that the
density $f$ is continuous, one obtains a strongly consistent estimator
$\hat f_n := \hat f_{n,h_n}$ of $f$, that is, one has with probability 1,
$\hat f_n(x) \to f(x)$, $x \in \mathbb{R}^d$. There are also
results concerning uniform convergence and convergence rates. For proving
such results one usually writes the difference $\hat f_n(x) - f(x)$ as the sum of a
probabilistic term $\hat f_n(x) - \mathbb{E}\hat f_n(x)$ and a deterministic term
$\mathbb{E}\hat f_n(x) - f(x)$, the so-called bias. The order of the bias depends on
smoothness properties of $f$ only, whereas the first (random) term can be studied
via empirical process techniques, as has been pointed out by Stute [29, 30, 31]
and Pollard [26], among other authors.
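
The kernel density estimator defined above can be sketched in a few lines. This is only an illustration: the function names `gaussian_kernel` and `kde` are hypothetical, and the Gaussian kernel is one possible choice that integrates to 1 and is bounded. Note the convention of the text: the kernel argument is scaled by $h^{1/d}$ while the normalization uses $h$ itself.

```python
import math

def gaussian_kernel(u):
    """Standard normal density on R^d (integrates to 1, bounded)."""
    d = len(u)
    return math.exp(-0.5 * sum(c * c for c in u)) / (2.0 * math.pi) ** (d / 2.0)

def kde(x, sample, h, kernel=gaussian_kernel):
    """Evaluate (n h)^{-1} * sum_i K((x - X_i) / h^{1/d}) at the point x.

    x is a sequence of d coordinates, sample a list of such sequences.
    """
    d = len(x)
    scale = h ** (1.0 / d)  # the kernel argument is scaled by h^{1/d}
    total = 0.0
    for xi in sample:
        total += kernel([(a - b) / scale for a, b in zip(x, xi)])
    return total / (len(sample) * h)
```

For $d = 1$ the scaling $h^{1/d}$ is just $h$, so `kde` reduces to the familiar univariate estimator.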

A recent result by Giné and Guillou [14] (see also [5]) shows that if $K$ is
a "regular" kernel, the density function $f$ is bounded and $h_n$ satisfies the
regularity conditions $h_n \searrow 0$, $h_n/h_{2n}$ is bounded,

$\log(1/h_n)/\log\log n \to \infty$ and $nh_n/\log n \to \infty$,

one has with probability 1,

(1.1) $\|\hat f_n - \mathbb{E}\hat f_n\|_\infty = O(\sqrt{\log(1/h_n)/(nh_n)})$.

Moreover, this rate cannot be improved. Interestingly, one does not need
continuity of $f$ for this result. (Of course, continuity of $f$ is crucial for
controlling the bias.)

Some related results on uniform convergence over compact subsets have
been obtained by Einmahl and Mason (EM) [11] for a much larger class of
estimators including kernel estimators for regression functions among others.
In this general setting, however, it is often not possible to obtain the
convergence uniformly over $\mathbb{R}^d$. Density estimators are in that sense
somewhat exceptional.

The main purpose of this paper is to introduce a method to establish
consistency of kernel-type estimators when the bandwidth $h$ is allowed to
range in a small interval which may decrease in length with the sample size.
Our results will be immediately applicable to proving uniform consistency
of kernel-type estimators when the bandwidth $h$ is a function of the
location $x$ or the data $X_1, \ldots, X_n$. The resulting "variable bandwidth kernel
estimators" are from a statistical point of view clearly preferable to
estimators whose bandwidth is only a function of the sample size $n$, ignoring
the data and the location. We discuss this in more detail in Remark 7 below,
after we have stated some of our main results. Furthermore, we address the issue
of bias in Remark 6.

In order to formulate our results let us first specify what we mean by a
"regular" kernel $K$. Consider the class of functions

$\mathcal{K} = \{K((x - \cdot)/h^{1/d}) : h > 0, x \in \mathbb{R}^d\}$.

For $\varepsilon > 0$, let $N(\varepsilon, \mathcal{K}) = \sup_Q N(\kappa\varepsilon, \mathcal{K}, d_Q)$, where the supremum is taken
over all probability measures $Q$ on $(\mathbb{R}^d, \mathcal{B})$, $d_Q$ is the $L_2(Q)$-metric and, as
usual, $N(\varepsilon, \mathcal{K}, d_Q)$ is the minimal number of balls $\{g : d_Q(g, g') < \varepsilon\}$ of
$d_Q$-radius $\varepsilon$ needed to cover $\mathcal{K}$. Assume that $\mathcal{K}$ satisfies the following uniform
entropy condition:

(K.iii) for some $C > 0$ and $\nu > 0$, $N(\varepsilon, \mathcal{K}) \le C\varepsilon^{-\nu}$, $0 < \varepsilon < 1$.

Pollard [26], Nolan and Pollard [25] and van der Vaart and Wellner [35]
provide a number of sufficient conditions for (K.iii) to hold. For instance,
it is satisfied for general $d \ge 1$ whenever $K(x) = \phi(p(x))$, with $p(x)$ being
a polynomial in $d$ variables and $\phi$ being a real-valued function of bounded
variation.

Finally, to avoid using outer probability measures in all of our statements,
we impose the following measurability assumption.

(K.iv) $\mathcal{K}$ is a pointwise measurable class, that is, there exists a countable
subclass $\mathcal{K}_0$ of $\mathcal{K}$ such that we can find for any function $g \in \mathcal{K}$ a
sequence of functions $\{g_m\}$ in $\mathcal{K}_0$ for which

$g_m(z) \to g(z)$, $z \in \mathbb{R}^d$.

This condition is discussed in [35]. It is satisfied whenever $K$ is right
continuous.

Our first result concerning density estimators is the following.

Theorem 1. Assuming (K.i)-(K.iv) and $f$ is bounded, we have for any
$c > 0$, with probability 1,

(1.2) $\limsup_{n \to \infty} \sup_{c\log n/n \le h \le 1} \frac{\sqrt{nh}\,\|\hat f_{n,h} - \mathbb{E}\hat f_{n,h}\|_\infty}{\sqrt{\log(1/h) \vee \log\log n}} =: K(c) < \infty$.

Remark 1. Though this was not our main goal, we point out that
if one chooses a deterministic sequence $h_n$ satisfying $nh_n/\log n \to \infty$ and
$\log(1/h_n)/\log\log n \to \infty$, one re-obtains (1.1), which is Theorem 1 of Giné
and Guillou [14] with slightly less regularity. (We do not need to assume, as
they do, that $h_n \searrow 0$ or that $h_n/h_{2n}$ is bounded.)
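
To make the normalized quantity in (1.2) concrete, here is a minimal simulation sketch for $d = 1$. It assumes standard normal data and a Gaussian kernel, chosen only because $\mathbb{E}\hat f_{n,h}$ is then available in closed form (the $N(0, 1 + h^2)$ density); the function names and grid sizes are hypothetical. The suprema over $x$ and $h$ are approximated on finite grids, so the returned value is a lower bound for the true supremum.

```python
import math
import random

def phi(u, s=1.0):
    """Density of N(0, s^2) at u."""
    return math.exp(-0.5 * (u / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def normalized_sup_deviation(sample, c=1.0, n_bandwidths=15, n_points=81):
    """Grid approximation of
    sup_{c log n / n <= h <= 1} sqrt(n h) ||f_hat - E f_hat|| / sqrt(log(1/h) v loglog n)
    for N(0,1) data and a Gaussian kernel (d = 1)."""
    n = len(sample)
    h_min = c * math.log(n) / n
    loglog = math.log(math.log(n))
    xs = [-4.0 + 8.0 * i / (n_points - 1) for i in range(n_points)]
    worst = 0.0
    for j in range(n_bandwidths):
        # geometric grid running from h_min up to 1
        h = h_min * (1.0 / h_min) ** (j / (n_bandwidths - 1))
        dev = 0.0
        for x in xs:
            f_hat = sum(phi((x - xi) / h) for xi in sample) / (n * h)
            mean = phi(x, math.sqrt(1.0 + h * h))  # exact E f_hat in this model
            dev = max(dev, abs(f_hat - mean))
        norm = math.sqrt(max(math.log(1.0 / h), loglog))
        worst = max(worst, math.sqrt(n * h) * dev / norm)
    return worst
```

Running this for increasing sample sizes gives a rough feel for the boundedness asserted in Theorem 1; it is of course no substitute for the proof.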

Remark 2. With applications to variable bandwidth estimators in mind,
we further note that Theorem 1 implies for any sequences $0 < a_n < b_n < 1$,
satisfying $b_n \to 0$ and $na_n/\log n \to \infty$, with probability 1,

(1.3) $\sup_{a_n \le h \le b_n} \|\hat f_{n,h} - \mathbb{E}\hat f_{n,h}\|_\infty = O\Bigl(\sqrt{\frac{\log(1/a_n) \vee \log\log n}{na_n}}\Bigr)$,

which in turn implies

(1.4) $\lim_{n \to \infty} \sup_{a_n \le h \le b_n} \|\hat f_{n,h} - \mathbb{E}\hat f_{n,h}\|_\infty = 0$ a.s.

Remark 3. It is routine to modify the proof of Theorem 1 to show that
it remains true when (K.iii) is replaced by the bracketing condition:

(K'.iii) for some $C_0 > 0$ and $\nu_0 > 0$, $N_{[\,]}(\varepsilon, \mathcal{K}, L_2(P)) \le C_0\varepsilon^{-\nu_0}$, $0 < \varepsilon < 1$.

Refer to page 270 of [34] for the definition of $N_{[\,]}(\varepsilon, \mathcal{K}, L_2(P))$. Essentially
all that one has to do is to replace the use of Corollary 4 by Lemma 19.34
of van der Vaart [34].

For a related result refer to Theorem 1 of Nolan and Marron [24], where
almost sure convergence to zero has been established in a similar setting.
On the other hand, our result provides explicit convergence rates for kernel
density estimators.

Let us now look at the bias term. As soon as we know that

(1.5) $\sup_{a_n \le h \le b_n} \|\mathbb{E}\hat f_{n,h} - f\|_\infty \to 0$,

we have under the conditions of Theorem 1,

$\sup_{a_n \le h \le b_n} \|\hat f_{n,h} - f\|_\infty \to 0$.

If $f$ is uniformly continuous on $\mathbb{R}^d$, here is a sufficient condition for (1.5)
which is easy to verify: Define

$\Psi_K(x) = \sup_{|y| \ge |x|} |K(y)|$, $x \in \mathbb{R}^d$,

and introduce the assumption

(K.v) $\int_{\mathbb{R}^d} \Psi_K(x)\,dx < \infty$.

Note that this assumption trivially holds for a compactly supported kernel
function.

Corollary 1. Assuming (K.i)-(K.v), for any sequences $0 < a_n < b_n < 1$,
satisfying $b_n \to 0$ and $na_n/\log n \to \infty$, and any uniformly continuous
density $f$, we have

(1.6) $\lim_{n \to \infty} \sup_{a_n \le h \le b_n} \|\hat f_{n,h} - f\|_\infty = 0$ a.s.

Remark 4. If $a_n = c\log n/n$ for some $c > 0$, then (1.6) does not hold,
that is, the limit in (1.6) is positive. Refer to [4] and [6] for details.

Our method is not restricted to the case of kernel density estimators.
To give the reader an indication of what other kinds of kernel-type
estimators can be treated using our techniques, consider i.i.d. $(d + 1)$-dimensional
random vectors $(Y, X), (Y_1, X_1), (Y_2, X_2), \ldots$, where the $Y$-variables are
one-dimensional. We shall assume that $X$ has a marginal Lebesgue density
function $f$ and that the regression function

$m(x) = \mathbb{E}[Y \mid X = x]$, $x \in \mathbb{R}^d$,

exists. Let $\hat m_{n,h}(x)$ be the usual Nadaraya-Watson estimator of $m(x)$ with
bandwidth $0 < h < 1$, that is,

$\hat m_{n,h}(x) = \frac{\sum_{i=1}^n Y_i K((x - X_i)/h^{1/d})}{\sum_{i=1}^n K((x - X_i)/h^{1/d})}$.

A huge literature has been developed on the consistency of the
Nadaraya-Watson estimator. Consult [16] and [11] for references to some of the more
important work.
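
As an illustration of the Nadaraya-Watson estimator just defined, the following sketch uses a box kernel supported on $[-1/2, 1/2]^d$. The names `box_kernel` and `nadaraya_watson`, and the choice to return NaN when no observation falls in the window, are assumptions of this example rather than part of the paper.

```python
def box_kernel(u):
    """Indicator of the cube [-1/2, 1/2]^d; integrates to 1."""
    return 1.0 if all(abs(c) <= 0.5 for c in u) else 0.0

def nadaraya_watson(x, xs, ys, h, kernel=box_kernel):
    """Ratio sum_i Y_i K((x - X_i)/h^{1/d}) / sum_i K((x - X_i)/h^{1/d})."""
    d = len(x)
    scale = h ** (1.0 / d)
    num = 0.0
    den = 0.0
    for xi, yi in zip(xs, ys):
        w = kernel([(a - b) / scale for a, b in zip(x, xi)])
        num += yi * w
        den += w
    if den == 0.0:
        return float("nan")  # no data in the kernel window: estimator undefined
    return num / den
```

With the box kernel the estimate is simply the average of the $Y_i$ whose $X_i$ lie in the cube of side $h^{1/d}$ centered at $x$.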

Assuming that m is 1 times differentiable at a fixed xq, one can use 
the local polynomial regression techniques of Fan and Gijbels [12] to obtain 
a better estimate at xq than that given by the Nadaraya-Watson estimator. 
We will not treat the uniform consistency of such estimators in the present 
paper. It should, however, be feasible to apply similar empirical process 
methods in this setting as well. 

With the above setup we have the following uniform in bandwidth result.
Set

$\bar r(x, h) = \mathbb{E}[YK((x - X)/h^{1/d})]/h$ and $\bar f(x, h) = \mathbb{E}[K((x - X)/h^{1/d})]/h$.

For any subset $I$ of $\mathbb{R}^d$, let $I^\varepsilon$ denote its closed $\varepsilon$-neighborhood with respect
to the maximum-norm $|\cdot|_+$ on $\mathbb{R}^d$, that is, $|x|_+ = \max_{1 \le j \le d} |x_j|$, $x \in \mathbb{R}^d$. Set
further for any function $\psi : \mathbb{R}^d \to \mathbb{R}$, $\|\psi\|_I = \sup_{x \in I} |\psi(x)|$.

Theorem 2. Let $I$ be a compact subset of $\mathbb{R}^d$ and let $K$ satisfy (K.i)-(K.iv)
with support contained in $[-1/2, 1/2]^d$. Suppose further that there
exists an $\varepsilon > 0$ so that

(1.7) $f$ is continuous and strictly positive on $J := I^\varepsilon$.

If there exists an $M > 0$ such that

(1.8) $|Y| 1\{X \in J\} \le M$ a.s.,

we have for large enough $c > 0$ and any $b_n \searrow 0$,

(1.9) $\limsup_{n \to \infty} \sup_{c\log n/n \le h \le b_n} \frac{\sqrt{nh}\,\|\hat m_{n,h} - \bar r(\cdot, h)/\bar f(\cdot, h)\|_I}{\sqrt{\log(1/h) \vee \log\log n}} =: K(I, c) < \infty$ a.s.

Moreover, if instead of (1.8) we assume that for some $p > 2$

(1.10) $\sup_{z \in J} \mathbb{E}(|Y|^p \mid X = z) =: \alpha < \infty$,

we have for any $c > 0$ and $b_n \searrow 0$ with $\gamma = \gamma(p) = 1 - 2/p$,

(1.11) $\limsup_{n \to \infty} \sup_{c(\log n/n)^\gamma \le h \le b_n} \frac{\sqrt{nh}\,\|\hat m_{n,h} - \bar r(\cdot, h)/\bar f(\cdot, h)\|_I}{\sqrt{\log(1/h) \vee \log\log n}} =: K'(I, c) < \infty$ a.s.

Corollary 2. Let $I$ be a compact subset of $\mathbb{R}^d$ and let $K$ satisfy
(K.i)-(K.iv) with support contained in $[-1/2, 1/2]^d$. Assume that the
distribution function of $(Y, X)$ has a Lebesgue density $(y, x) \mapsto p(y, x)$, so that
the marginal density of $X$ is given by

$f(x) = \int_{-\infty}^{\infty} p(y, x)\,dy$, $x \in \mathbb{R}^d$.

Suppose further that there exists an $\varepsilon > 0$ so that (1.7) holds and that

(1.12) for all $z \in J$, $\lim_{z' \to z} p(y, z') = p(y, z)$ for almost every $y \in \mathbb{R}$.

If (1.8) holds, then for $0 < a_n < b_n < 1$, satisfying $b_n \to 0$ and
$na_n/\log n \to \infty$,

(1.13) $\lim_{n \to \infty} \sup_{a_n \le h \le b_n} \|\hat m_{n,h} - m\|_I = 0$ a.s.

If (1.10) holds, then with $\gamma = 1 - 2/p$ for $0 < c(\log n/n)^\gamma < b_n < 1$ satisfying
$b_n \to 0$,

(1.14) $\lim_{n \to \infty} \sup_{c(\log n/n)^\gamma \le h \le b_n} \|\hat m_{n,h} - m\|_I = 0$ a.s.

Remark 5. Let us also mention that if, in the bounded case, we choose
a deterministic bandwidth sequence $h_n$ satisfying the standard assumptions
$nh_n/\log n \to \infty$ and $\log(1/h_n)/\log\log n \to \infty$, we get that with
probability 1,

$\limsup_{n \to \infty} \frac{\sqrt{nh_n}\,\|\hat m_{n,h_n} - \bar r(\cdot, h_n)/\bar f(\cdot, h_n)\|_I}{\sqrt{2\log(1/h_n)}} \le C < \infty$.

This is a sharp result. In our previous paper [11] we have shown under
additional assumptions ($h_n \searrow 0$ and $nh_n \nearrow \infty$, $d = 1$, $I = [a, b]$ and $K$ satisfies a
continuity condition and is of bounded variation on $\mathbb{R}$) that the limsup is
positive and actually a limit. [Note, however, that the limiting constant has
not been correctly stated in formula (1.16) of that paper. With the
notation of the present paper the limiting constant is $\sup_{x \in I} (\sigma(x)\|K\|_2)/\sqrt{f(x)}$,
where $\sigma^2(x) = \mathrm{Var}(Y \mid X = x)$.] Moreover, if (1.8) holds, then a result of
Collomb [3] implies that the condition $na_n/\log n \to \infty$ is necessary for uniform
consistency.

Remark 6. Under additional smoothness assumptions on $f$ one can
also derive explicit convergence rates in (1.6) and (1.14). For instance, if one
knows that $f$ is uniformly Lipschitz continuous, one easily sees that the bias
(1.5) is of order $O(b_n^{1/d})$, which permits one to derive a convergence rate in
(1.6), one which depends on $a_n$, via the rate from Theorem 1, and on $b_n$, via
the rate in (1.5). For more information on the interplay between
smoothness and the size of the bias term consult [1, 8, 10]. Similarly under extra
smoothness conditions the bias term in the Nadaraya-Watson estimator is
well behaved and one also can specify convergence rates. For appropriate
smoothness conditions refer to [1] and, especially, to Section 2.3 of [7].
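
As a worked illustration of how the two rates combine, consider $d = 1$. The Lipschitz constant $L$, the moment condition $\int |u||K(u)|\,du < \infty$ and the particular choice $a_n = an^{-1/3}$, $b_n = bn^{-1/3}$ are assumptions made only for this sketch.

```latex
% Sketch for d = 1, assuming |f(x) - f(y)| \le L|x - y| and
% \int |u||K(u)|\,du < \infty (assumptions of this illustration).
\begin{align*}
\sup_{a_n \le h \le b_n} \|\mathbb{E}\hat f_{n,h} - f\|_\infty
  &\le \sup_{a_n \le h \le b_n} \sup_x \int |f(x - uh) - f(x)|\,|K(u)|\,du
   \le L\, b_n \int |u||K(u)|\,du = O(b_n), \\
\sup_{a_n \le h \le b_n} \|\hat f_{n,h} - \mathbb{E}\hat f_{n,h}\|_\infty
  &= O\Bigl(\sqrt{\tfrac{\log(1/a_n) \vee \log\log n}{n a_n}}\Bigr)
  \quad \text{a.s., by (1.3)}.
\end{align*}
% With a_n = a n^{-1/3} and b_n = b n^{-1/3}, 0 < a < b < \infty:
%   n a_n = a n^{2/3}, \quad \log(1/a_n) \sim \tfrac{1}{3}\log n,
% so the random part is O(n^{-1/3}\sqrt{\log n}) and the bias is O(n^{-1/3}):
\[
\sup_{a_n \le h \le b_n} \|\hat f_{n,h} - f\|_\infty
  = O\bigl(n^{-1/3}\sqrt{\log n}\bigr) \quad \text{a.s.}
\]
```

The first line uses $\mathbb{E}\hat f_{n,h}(x) = \int K(u) f(x - uh)\,du$ together with (K.i).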

Remark 7. Suppose now that $\hat h_n = \hat h_n(x)$ is a local data-driven
bandwidth sequence satisfying

(1.15) $\mathbb{P}\{a_n \le \hat h_n(x) \le b_n : x \in I\} \to 1$,

or a constant data-driven bandwidth sequence $\hat h_n$ satisfying with probability
1, for all large enough $n \ge 1$,

(1.16) $a_n \le \hat h_n \le b_n$.

For instance, if $d = 1$, one often has for appropriate $0 < a < b < \infty$,
$a_n = an^{-1/5}$ and $b_n = bn^{-1/5}$. [10] is a good place to read about the various
optimality criteria that lead to the rate $n^{-1/5}$. In this case and more generally under
the assumptions of Corollary 1, 

and under those of Corollary 2, 

where the convergence is either in probability or with probability 1 depend- 
ing on whether (1.15) or (1.16) holds. 
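
To illustrate condition (1.16) for $d = 1$, the sketch below computes a constant data-driven bandwidth by the normal reference rule $1.06\,\hat\sigma\, n^{-1/5}$ and checks whether it falls in a band $[an^{-1/5}, bn^{-1/5}]$. The rule is a standard heuristic and is not taken from this paper; the constants `a` and `b` and both function names are hypothetical.

```python
import math
import random
import statistics

def rule_of_thumb_bandwidth(sample):
    """Normal reference rule 1.06 * sd * n^{-1/5} (a heuristic, d = 1)."""
    n = len(sample)
    return 1.06 * statistics.stdev(sample) * n ** (-1.0 / 5.0)

def within_band(h_hat, n, a, b):
    """Check a * n^{-1/5} <= h_hat <= b * n^{-1/5}, as in (1.16)."""
    rate = n ** (-1.0 / 5.0)
    return a * rate <= h_hat <= b * rate
```

Since the sample standard deviation converges almost surely when the variance is finite, such a bandwidth eventually stays in a band of this form for any $a < 1.06\,\sigma < b$, which is the situation the results above are designed for.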

Deheuvels and Mason [7] consider local plug-in type estimators $\hat h_n(x)$
which satisfy (1.15) with $a_n = c_1h_n$ and $b_n = c_2h_n$, where $c_1 < c_2$, or

(1.17) $\mathbb{P}\bigl(\sup_{x \in I} |\hat h_n(x)/h_n - C(x)| > \varepsilon\bigr) \to 0$

for any $\varepsilon > 0$, where $C$ is an appropriate continuous function on $I$. Refer
especially to their Example 2.1, where they show subject to smoothness
assumptions that the optimal $\hat h_n(x)$ in terms of asymptotic mean square
error for estimating / or Tlx satisfies (1.17) with — • 

The literature on data-driven bandwidth selection is extensive. We cite,
for instance, [2, 17, 21, 22, 23, 27]. For further references and methods
consult [18], Chapter 7 of [10], [7] and [9].

All data-driven bandwidth selection procedures require some smoothness
assumptions in order to get rates. Our results show that even if such
assumptions do not hold, one may still have consistency as long as (1.15) is
satisfied for appropriate $a_n$ and $b_n$ not necessarily of the form $a_n = c_1h_n$ and
$b_n = c_2h_n$.
Our next example is a kernel estimator of the conditional distribution
function

$F(t \mid z) := \mathbb{P}(Y \le t \mid X = z)$,

defined for a kernel $K$ and bandwidth $0 < h < 1$ to be

(1.18) $\hat F_{n,h}(t \mid z) = \frac{\sum_{i=1}^n 1\{Y_i \le t\} K((z - X_i)/h^{1/d})}{\sum_{i=1}^n K((z - X_i)/h^{1/d})}$.

Stute [32] calls this the conditional empirical distribution function and was
the first to establish uniform consistency results for it.
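
A minimal sketch of the conditional empirical distribution function (1.18) follows. The box kernel, the function names and the NaN convention for an empty window are assumptions of this example.

```python
def box_kernel(u):
    """Indicator of the cube [-1/2, 1/2]^d; integrates to 1."""
    return 1.0 if all(abs(c) <= 0.5 for c in u) else 0.0

def conditional_edf(t, z, xs, ys, h, kernel=box_kernel):
    """Kernel-weighted fraction of observations with Y_i <= t near X = z."""
    d = len(z)
    scale = h ** (1.0 / d)
    num = 0.0
    den = 0.0
    for xi, yi in zip(xs, ys):
        w = kernel([(a - b) / scale for a, b in zip(z, xi)])
        den += w
        if yi <= t:
            num += w
    return num / den if den > 0.0 else float("nan")
```

For a nonnegative kernel the result is a genuine distribution function in $t$: nondecreasing, with values in $[0, 1]$.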

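Theorem 3. Let $I$ be a compact subset of $\mathbb{R}^d$ and let $K$ satisfy (K.i)-(K.iv)
with support contained in $[-1/2, 1/2]^d$. Suppose further that there
exists an $\varepsilon > 0$ so that (1.7) holds. Then, with probability 1, we have for
large enough $c > 0$ and any $b_n \searrow 0$,

(1.19) $\limsup_{n \to \infty} \sup_{c\log n/n \le h \le b_n} \frac{\sup_{z \in I} \sqrt{nh}\,\|\hat F_{n,h}(\cdot \mid z) - \bar F_{n,h}(\cdot \mid z)\|_\infty}{\sqrt{\log(1/h) \vee \log\log n}} =: K''(I, c) < \infty$,

where $\bar F_{n,h}(t \mid z) = \mathbb{E}[K((z - X)/h^{1/d}) 1\{Y \le t\}]/(h\mathbb{E}\hat f_{n,h}(z))$, $t \in \mathbb{R}$.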

Corollary 3. Let $I$ be a compact subset of $\mathbb{R}^d$ and let $K$ satisfy
(K.i)-(K.iv) with support contained in $[-1/2, 1/2]^d$. Suppose further that
there exists an $\varepsilon > 0$ so that (1.7) holds and (1.12) is satisfied. Then for
$0 < a_n < b_n < 1$, satisfying $b_n \to 0$ and $na_n/\log n \to \infty$,

(1.20) $\lim_{n \to \infty} \sup_{a_n \le h \le b_n} \sup_{z \in I} \|\hat F_{n,h}(\cdot \mid z) - F(\cdot \mid z)\|_\infty = 0$.

Remark 8. Sometimes one wants to use vector bandwidths (see, in
particular, Chapter 12 of Devroye and Lugosi [9]). With obvious changes
of notation, our results and their proofs remain true when $h_n$ is replaced
by a vector bandwidth $\mathbf{h}_n = (h_n^{(1)}, \ldots, h_n^{(d)})$, where $\min_{1 \le j \le d} h_n^{(j)} > 0$. In this
situation we set $h_n = \prod_{j=1}^d h_n^{(j)}$ and for any vector $v = (v_1, \ldots, v_d)$ we replace
$v/h_n^{1/d}$ by $(v_1/h_n^{(1)}, \ldots, v_d/h_n^{(d)})$. For ease of presentation we chose to use
real-valued bandwidths throughout.



Theorem 1 is proved in Section 2. Theorems 2 and 3 will follow from a 
more general result stated and proved in Section 3. Our proofs are based 
on an extension of the methods developed in [11]. We use the same idea 
which was developed in [11], namely, combining an exponential inequality 
of Talagrand [33] with a suitable moment inequality. 

2. Proofs of Theorem 1 and Corollary 1. We shall look at a slightly
more general setup than in the Introduction. Let $(\mathcal{X}, \mathcal{A})$ be a measurable
space. Throughout this section we assume that on our basic probability
space $(\Omega, \mathcal{F}, \mathbb{P})$ we have independent $(\mathcal{F}, \mathcal{A})$-measurable variables
$X_i : \Omega \to \mathcal{X}$, $1 \le i \le n$, with common distribution $\mu$.

Let $\mathcal{G}$ be a pointwise measurable class of functions from $\mathcal{X}$ to $\mathbb{R}$ (see the
Introduction and Example 2.3.4 in [35]). Further let $\varepsilon_1, \ldots, \varepsilon_n$ be a sequence
of independent Rademacher random variables, independent of $X_1, \ldots, X_n$.
Let $G$ be a finite-valued measurable function satisfying for all $x \in \mathcal{X}$,

(2.1) $G(x) \ge \sup_{g \in \mathcal{G}} |g(x)|$,

and define

(2.2) $N(\varepsilon, \mathcal{G}) = \sup_Q N(\varepsilon\sqrt{Q(G^2)}, \mathcal{G}, d_Q)$,

where the supremum is taken over all probability measures $Q$ on $(\mathcal{X}, \mathcal{A})$ for
which $0 < Q(G^2) < \infty$ and $d_Q$ is the $L_2(Q)$-metric.

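We need the following version of Proposition A.1 of EM [11].

Proposition 1. Let $\mathcal{G}$ be a pointwise measurable class of bounded
functions such that for some constants $C, \nu \ge 1$ and $0 < \sigma \le \beta$ and $G$ as above,
the following conditions hold:

(i) $\mathbb{E}[G(X)^2] \le \beta^2$;

(ii) $N(\varepsilon, \mathcal{G}) \le C\varepsilon^{-\nu}$, $0 < \varepsilon < 1$;

(iii) $\sigma_0^2 := \sup_{g \in \mathcal{G}} \mathbb{E}[g(X)^2] \le \sigma^2$;

(iv) $\sup_{g \in \mathcal{G}} \|g\|_\infty \le \frac{1}{4\sqrt{\nu}}\sqrt{n\sigma^2/\log(C_1\beta/\sigma)}$, where $C_1 = C^{1/\nu} \vee e$.

Then we have for some absolute constant $A$,

(2.3) $\mathbb{E}\Bigl\|\sum_{i=1}^n \varepsilon_i g(X_i)\Bigr\|_{\mathcal{G}} \le A\sqrt{\nu n\sigma^2\log(C_1\beta/\sigma)}$.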



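Proof. Our proof is a modification of that of Proposition A.1 of EM
[11]. We denote vectors $(x_1, \ldots, x_n) \in \mathcal{X}^n$ by $\mathbf{x}$ and we define the subsets $F_n$
and $G_n$ of $\mathcal{X}^n$ as in that paper, that is,

$G_n := \bigl\{\mathbf{x} : n^{-1} \sum_{j=1}^n G^2(x_j) \le 256\beta^2\bigr\}$,

$F_n := \bigl\{\mathbf{x} : n^{-1} \sup_{g \in \mathcal{G}} \sum_{j=1}^n g^2(x_j) \le 64\sigma^2\bigr\}$.

We can infer from (A.8)-(A.10) in [11] that on $F_n \cap G_n$,

$\mathbb{E}\Bigl\|\sum_{i=1}^n \varepsilon_i g(x_i)\Bigr\|_{\mathcal{G}} \le K'\sigma\sqrt{n\nu\log(C_1\beta/\sigma)}$,

where $K'$ is an absolute constant. Therefore, we have for

(2.4) $t \ge 96K'\sigma\sqrt{n\nu\log(C_1\beta/\sigma)}$,

$\mathbb{P}\Bigl\{\Bigl\|\sum_{i=1}^n \varepsilon_i g(x_i)\Bigr\|_{\mathcal{G}} \ge t\Bigr\} \le 1/96$ for all $\mathbf{x} \in F_n \cap G_n$

and, consequently, in this range of $t$ that

$\mathbb{P}\Bigl\{\Bigl\|\sum_{i=1}^n \varepsilon_i g(X_i)\Bigr\|_{\mathcal{G}} \ge t\Bigr\} \le 1/96 + \mu^n(F_n^c) + \mu^n(G_n^c)$.

By Markov's inequality we trivially have $\mu^n(G_n^c) \le 1/256$. Using Lemma 5.1
of [15] exactly as in [11] and recalling that $C_1\beta/\sigma \ge e$ and $\nu \ge 1$, we see that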

/i'^(F^) < 4/i"(G^) + 12 • WiGip/a)-^^" < 7/256, 



which finally implies that

$\mathbb{P}\Bigl\{\Bigl\|\sum_{i=1}^n \varepsilon_i g(X_i)\Bigr\|_{\mathcal{G}} \ge t\Bigr\} \le 1/24$,

whenever (2.4) holds. A straightforward application of the Hoffmann-Jørgensen
inequality as stated in Proposition 6.8 of Ledoux and Talagrand [20] finally
yields the desired moment inequality. □

From the above moment inequality we can infer the following:

Corollary 4. Let $\mathcal{G}$ be as in Proposition 1 satisfying (i)-(iii), and
instead of (iv) assume that

(v) $\sup_{g \in \mathcal{G}} \|g\|_\infty \le U$, where $\sigma_0 \le U \le C_2\sqrt{n}\beta$, and $C_2 = 1/(4\sqrt{\nu\log C_1})$.

Then we have

(2.5) $\mathbb{E}\Bigl\|\sum_{i=1}^n \varepsilon_i g(X_i)\Bigr\|_{\mathcal{G}} \le A\bigl\{\sqrt{\nu n\sigma_0^2\log(C_1\beta/\sigma_0)} + 2\nu U\log(C_3n(\beta/U)^2)\bigr\}$,

where $C_3 = C_1^2/(16\nu)$.
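Proof. Whenever

$U \le \frac{1}{4\sqrt{\nu}}\sqrt{n\sigma_0^2/\log(C_1\beta/\sigma_0)}$,

inequality (2.5) follows immediately from our proposition by choosing
$\sigma = \sigma_0$.

Assume now that

$\frac{1}{4\sqrt{\nu}}\sqrt{n\sigma_0^2/\log(C_1\beta/\sigma_0)} < U \le C_2\sqrt{n}\beta$.

Then using the monotonicity of the function $t \mapsto \sqrt{nt^2/\log(C_1\beta/t)}$ we can
find a unique $\sigma \in\, ]\sigma_0, \beta]$ satisfying

$U = \frac{1}{4\sqrt{\nu}}\sqrt{n\sigma^2/\log(C_1\beta/\sigma)}$.

Applying our proposition with this choice of $\sigma$, it follows that

$\mathbb{E}\Bigl\|\sum_{i=1}^n \varepsilon_i g(X_i)\Bigr\|_{\mathcal{G}} \le A\sqrt{\nu n\sigma^2\log(C_1\beta/\sigma)} \le 4A\nu U\log(C_1\beta/\sigma)$.

Next rewriting the equation which defines $\sigma$ and recalling that $C_1\beta/\sigma \ge e$,
we readily obtain that

$1/\sigma \le \sigma^{-1}\sqrt{\log(C_1\beta/\sigma)} = \sqrt{n}/(4\sqrt{\nu}\,U)$,

and thus

$C_1(\beta/\sigma) \le C_1/(4\sqrt{\nu})\,(\sqrt{n}\beta/U) =: \sqrt{C_3}\,(\sqrt{n}\beta/U)$.

It follows that

$\mathbb{E}\Bigl\|\sum_{i=1}^n \varepsilon_i g(X_i)\Bigr\|_{\mathcal{G}} \le 2A\nu U\log(C_3n(\beta/U)^2)$,

which proves the corollary. □

A bound similar to that given in Corollary 4 has been given by Giné and
Guillou [13] using a different method.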
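
The second case in the proof of Corollary 4 rests on solving $U = \frac{1}{4\sqrt{\nu}}\sqrt{n\sigma^2/\log(C_1\beta/\sigma)}$ for $\sigma$. The following sketch does this numerically by bisection, using the monotonicity noted in the proof, and allows one to check the identity $\sqrt{\nu n\sigma^2\log(C_1\beta/\sigma)} = 4\nu U\log(C_1\beta/\sigma)$ at the solution. The function names and the sample parameter values are hypothetical.

```python
import math

def psi(t, n, nu, c1, beta):
    """The increasing map t -> sqrt(n t^2 / log(c1*beta/t)) / (4 sqrt(nu)), 0 < t <= beta.

    Requires c1 >= e so that the logarithm is at least 1 on (0, beta].
    """
    return math.sqrt(n * t * t / math.log(c1 * beta / t)) / (4.0 * math.sqrt(nu))

def solve_sigma(u, n, nu, c1, beta, sigma0, iters=200):
    """Bisection for the unique sigma in (sigma0, beta] with psi(sigma) = u.

    Assumes psi(sigma0) < u <= psi(beta).
    """
    lo, hi = sigma0, beta
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if psi(mid, n, nu, c1, beta) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

At the computed $\sigma$ the two sides of the identity agree up to floating point error, which is exactly the step that converts the bound of Proposition 1 into the term $4A\nu U\log(C_1\beta/\sigma)$.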

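As already indicated in the Introduction, our proof is based on an
inequality of Talagrand [33] (see also [19]) which we state here for easy reference
later on.

Let $\alpha_n$ be the empirical process based on the sample $X_1, \ldots, X_n$, that is,
if $g : \mathcal{X} \to \mathbb{R}$, we have

$\alpha_n(g) = \sum_{i=1}^n (g(X_i) - \mathbb{E}g(X))/\sqrt{n}$,

and set for any class $\mathcal{G}$ of such functions

$\|\sqrt{n}\alpha_n\|_{\mathcal{G}} = \sup_{g \in \mathcal{G}} |\sqrt{n}\alpha_n(g)|$.

Inequality. Let $\mathcal{G}$ be a pointwise measurable class of functions
satisfying for some $0 < M < \infty$,

$\|g\|_\infty \le M$, $g \in \mathcal{G}$.

Then we have for all $t > 0$,

$\mathbb{P}\Bigl\{\max_{1 \le m \le n} \|\sqrt{m}\alpha_m\|_{\mathcal{G}} \ge A_1\Bigl(\mathbb{E}\Bigl\|\sum_{i=1}^n \varepsilon_i g(X_i)\Bigr\|_{\mathcal{G}} + t\Bigr)\Bigr\} \le 2\Bigl\{\exp\Bigl(-\frac{A_2t^2}{n\sigma_{\mathcal{G}}^2}\Bigr) + \exp\Bigl(-\frac{A_2t}{M}\Bigr)\Bigr\}$,

where $\sigma_{\mathcal{G}}^2 = \sup_{g \in \mathcal{G}} \mathrm{Var}(g(X))$ and $A_1, A_2$ are universal constants.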
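Proof of Theorem 1. We first note that

$\mathbb{E}[K^2((x - X)/h^{1/d})] = h \int_{\mathbb{R}^d} K^2(u) f(x - uh^{1/d})\,du \le h\|f\|_\infty\|K\|_2^2$,

where as usual $\|K\|_2 = (\int_{\mathbb{R}^d} K^2(s)\,ds)^{1/2}$.

Set for $j, k \ge 0$ and $c > 0$, $n_k = 2^k$, $h_{j,k} = (2^j c\log n_k)/n_k$ and

$\mathcal{K}_{j,k} = \{K((x - \cdot)/h^{1/d}) : h_{j,k} \le h \le h_{j+1,k}, x \in \mathbb{R}^d\}$.

Clearly for $h_{j,k} \le h \le h_{j+1,k}$, with $\kappa$ as in (K.ii),

$\mathbb{E}(K^2((x - X)/h^{1/d})) \le \kappa^2 \wedge 2h_{j,k}\|f\|_\infty\|K\|_2^2 =: \kappa^2 \wedge D_0h_{j,k} =: \sigma_{j,k}^2$.

We now use Corollary 4 to bound

$\mathbb{E}\Bigl\|\sum_{i=1}^{n_k} \varepsilon_i g(X_i)\Bigr\|_{\mathcal{K}_{j,k}}$.

To that end we note that each $\mathcal{K}_{j,k}$ satisfies (i) with $G = \beta = \kappa$. Further,
since $\mathcal{K}_{j,k} \subset \mathcal{K}$, we see by (K.iii) that each $\mathcal{K}_{j,k}$ also fulfills (ii). [W.l.o.g. we
assume that $\nu, C \ge 1$ in (K.iii).] Noting that

$C_1\beta/\sigma_0 \le (\beta^2/\sigma_0^2) \vee C_1^2$

and the function $h \mapsto h\log(h^{-1} \vee C_1^2)$ is increasing for $h > 0$ (recall that
$C_1 \ge e$), we see by applying Corollary 4 with $U = \beta = \kappa$ and using the
bound $\sigma_0 \le \sigma_{j,k}$, that we have for $j \ge 0$,

$\mathbb{E}\Bigl\|\sum_{i=1}^{n_k} \varepsilon_i g(X_i)\Bigr\|_{\mathcal{K}_{j,k}} \le A\sqrt{\nu n_k D_0h_{j,k}\log\Bigl(\frac{\beta^2}{D_0h_{j,k}} \vee C_1^2\Bigr)} + 2A\nu\kappa\log(C_3n_k)$,

which for $D_1 = A\sqrt{\nu D_0}$ and $D_2 = D_0/\beta^2$ is equal to

(2.6) $D_1\sqrt{n_kh_{j,k}\log((D_2h_{j,k})^{-1} \vee C_1^2)} + 2A\nu\kappa\log(C_3n_k)$.

Using once more the fact that $h \mapsto h\log(h^{-1} \vee C_1^2)$ is increasing for $h > 0$,
we see that the first term of the above bound is, for large $k$, greater than or
equal to

$D_1\sqrt{c\log n_k\,\log(n_k/\{cD_2\log n_k\})}$.

Thus the order of the second term is always smaller than or equal to that
of the first one. Consequently, we have for $j \ge 0$ and large enough $k$,

$\mathbb{E}\Bigl\|\sum_{i=1}^{n_k} \varepsilon_i g(X_i)\Bigr\|_{\mathcal{K}_{j,k}} \le D_3\sqrt{n_kh_{j,k}\Bigl(\log\frac{1}{h_{j,k}} \vee \log\log n_k\Bigr)} =: D_3a_{j,k}$,

where $D_3$ is a positive constant.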

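Applying the Inequality with $M = \kappa$ and $\sigma_{\mathcal{G}}^2 = \sigma_{\mathcal{K}_{j,k}}^2 \le D_0h_{j,k}$, we get for
any $t > 0$,

(2.7) $\mathbb{P}\Bigl\{\max_{n_{k-1} \le n \le n_k} \|\sqrt{n}\alpha_n\|_{\mathcal{K}_{j,k}} \ge A_1(D_3a_{j,k} + t)\Bigr\} \le 2[\exp(-A_2t^2/(D_0n_kh_{j,k})) + \exp(-A_2t/\kappa)]$.

Setting for any $\rho > 1$, $j \ge 0$ and $k \ge 1$,

$p_{j,k}(\rho) = \mathbb{P}\Bigl\{\max_{n_{k-1} \le n \le n_k} \|\sqrt{n}\alpha_n\|_{\mathcal{K}_{j,k}} \ge A_1(D_3 + \rho)a_{j,k}\Bigr\}$,

and using the fact that $a_{j,k}^2/(n_kh_{j,k}) \ge \log\log n_k$, we can infer that for large
$k$,

$p_{j,k}(\rho) \le 2\Bigl[\exp\Bigl(-\frac{A_2\rho^2}{D_0}\log\log n_k\Bigr) + \exp\Bigl(-\frac{A_2\rho}{\kappa}\sqrt{n_kh_{j,k}\log\log n_k}\Bigr)\Bigr]$.

Recalling that $h_{j,k} \ge c\log n_k/n_k$, we readily obtain that for large $k$ and any
$j \ge 0$,

(2.8) $p_{j,k}(\rho) \le 4(\log n_k)^{-\rho'}$,

where $\rho' = A_2\rho^2/D_0$.

Set $l_k = \max\{j : h_{j,k} \le 2\}$. It is easy to see that for large $k$,

(2.9) $l_k \le 2\log n_k$.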
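
The blocking scheme used in this proof, with $n_k = 2^k$, $h_{j,k} = 2^j c\log n_k/n_k$ and $l_k = \max\{j : h_{j,k} \le 2\}$, can be tabulated directly. The sketch below (function name hypothetical) returns the block edges $h_{0,k}, \ldots, h_{l_k,k}$, so that one can verify numerically that $l_k \le 2\log n_k$ and that the blocks cover $[c\log n_k/n_k, 1]$.

```python
import math

def bandwidth_blocks(k, c=1.0):
    """Return (n_k, l_k, [h_{0,k}, ..., h_{l_k,k}]) for the dyadic bandwidth blocks.

    Assumes c * log(n_k) / n_k <= 2, so that the set defining l_k is nonempty.
    """
    n_k = 2 ** k
    h0 = c * math.log(n_k) / n_k
    # l_k = max{ j : 2^j * h0 <= 2 }; the search range is a generous upper bound
    l_k = max(j for j in range(4 * k + 64) if (2 ** j) * h0 <= 2.0)
    edges = [(2 ** j) * h0 for j in range(l_k + 1)]
    return n_k, l_k, edges
```

Because $h_{l_k+1,k} = 2h_{l_k,k} > 2$, the last edge always satisfies $h_{l_k,k} > 1$, so the $l_k$ blocks $[h_{j,k}, h_{j+1,k}]$, $0 \le j \le l_k - 1$, cover the whole bandwidth range of Theorem 1.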

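Hence in view of (2.8) and (2.9) we have for large $k$ and $\rho \ge 1$,

$P_k(\rho) := \sum_{j=0}^{l_k - 1} p_{j,k}(\rho) \le 8(\log n_k)^{1 - \rho'}$,

which implies that if we choose $\rho \ge 2(D_0/A_2)^{1/2}$ (say), we have

(2.10) $\sum_{k=1}^{\infty} P_k(\rho) < \infty$.

Notice that by definition of $l_k$ for large $k$,

$2h_{l_k,k} = h_{l_k+1,k} \ge 2$.

Consequently, we then have for $n_{k-1} \le n \le n_k$,

$\Bigl[\frac{c\log n}{n}, 1\Bigr] \subset \Bigl[\frac{c\log n_k}{n_k}, h_{l_k,k}\Bigr]$.

Thus for all large enough $k$ and $n_{k-1} \le n \le n_k$,

$A_k(\rho) := \Bigl\{\max_{n_{k-1} \le n \le n_k} \sup_{c\log n/n \le h \le 1} \frac{\sqrt{nh}\,\|\hat f_{n,h} - \mathbb{E}\hat f_{n,h}\|_\infty}{\sqrt{\log(1/h) \vee \log\log n}} \ge 2A_1(D_3 + \rho)\Bigr\}$
$\subset \bigcup_{j=0}^{l_k - 1} \Bigl\{\max_{n_{k-1} \le n \le n_k} \|\sqrt{n}\alpha_n\|_{\mathcal{K}_{j,k}} \ge A_1(D_3 + \rho)a_{j,k}\Bigr\}$

and we see that $\mathbb{P}(A_k(\rho)) \le P_k(\rho)$. Recalling (2.10), we obtain our theorem
via the Borel-Cantelli lemma. □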



Remark 9. We note that if the density $f$ is bounded only over $J := I^\varepsilon$ for
some $\varepsilon > 0$, with $I$ a compact subset of $\mathbb{R}^d$, and if $K$ is a kernel with support
in $[-1/2, 1/2]^d$, we still have for any $0 < h_0 < (2\varepsilon)^d$, with probability 1,

(2.11) $\limsup_{n \to \infty} \sup_{c\log n/n \le h \le h_0} \frac{\sqrt{nh}\,\|\hat f_{n,h} - \mathbb{E}\hat f_{n,h}\|_I}{\sqrt{\log(1/h) \vee \log\log n}} =: K(I, c) < \infty$.

This follows immediately from the above proof by an obvious modification
of the bound for $\mathbb{E}K^2((x - X)/h^{1/d})$ and replacing the set $\mathbb{R}^d$ by $I$ in the
definition of $\mathcal{K}_{j,k}$.



Proof of Corollary 1. The proof of (1.3) is obvious from (1.2).
Turning to the proof of (1.6), we note that by integrability the assumption
that $f$ is uniformly continuous on $\mathbb{R}^d$ is equivalent to $f$ being continuous on
$\mathbb{R}^d$ and satisfying the condition that

$\lim_{R \to \infty} \sup\{f(z) : |z| \ge R\} = 0$,

which of course implies that $\|f\|_\infty < \infty$. This, when combined with the
corollary on page 65 of [28], gives the following lemma.

Lemma 1. Let $f$ be a uniformly continuous Lebesgue density function
on $\mathbb{R}^d$. Then for any kernel $K$ which satisfies (K.i), (K.ii) and (K.v), we
have

(2.12) $\sup_{z \in \mathbb{R}^d} |f * K_h(z) - f(z)| \to 0$ as $h \searrow 0$,

where $f * K_h(z) := h^{-1} \int_{\mathbb{R}^d} f(x)K(h^{-1/d}(z - x))\,dx$.

Observing that $\mathbb{E}\hat f_{n,h}(z) = f * K_h(z)$, we see that Lemma 1 gives (1.5),
which combined with (1.4) implies (1.6). □
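
The convergence in Lemma 1 can be observed numerically in a simple case. The sketch below takes $d = 1$, a triangular density and the box kernel on $[-1/2, 1/2]$ (all chosen for this illustration only, as are the function names), and approximates $\sup_z |f * K_h(z) - f(z)|$ on a grid with a midpoint rule. For $d = 1$ one has $f * K_h(z) = \int_{-1/2}^{1/2} f(z - uh)\,du$.

```python
def triangular_density(x):
    """f(x) = max(0, 1 - |x|), a uniformly continuous density on R."""
    return max(0.0, 1.0 - abs(x))

def smoothed(f, z, h, m=200):
    """Midpoint-rule value of int_{-1/2}^{1/2} f(z - u*h) du (box kernel, d = 1)."""
    return sum(f(z - ((i + 0.5) / m - 0.5) * h) for i in range(m)) / m

def sup_bias(f, h, lo=-2.0, hi=2.0, n_points=401):
    """Grid approximation of sup_z |f * K_h(z) - f(z)|."""
    zs = [lo + (hi - lo) * i / (n_points - 1) for i in range(n_points)]
    return max(abs(smoothed(f, z, h) - f(z)) for z in zs)
```

For this particular density the largest deviation occurs at the peak $z = 0$ and equals $h/4$, so the supremum decreases linearly in $h$.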



3. Proofs of Theorems 2 and 3 and Corollaries 2 and 3. We are now
ready to prove Theorems 2 and 3 and Corollaries 2 and 3. We shall consider
a slightly more general setting than in the Introduction, allowing the
variables $Y, Y_1, Y_2, \ldots$ to be $r$-dimensional, where $r \ge 1$. Further introduce the
following process:

Let $\Phi$ denote a class of measurable functions on $\mathbb{R}^r$ with a finite-valued
measurable envelope function $F$, that is,

(3.1) $F(y) \ge \sup_{\varphi \in \Phi} |\varphi(y)|$, $y \in \mathbb{R}^r$.

Further assume that $\Phi$ satisfies (K.iii) and (K.iv) with $\mathcal{K}$ replaced by $\Phi$. For
any $\varphi \in \Phi$ and continuous functions $c_\varphi$ and $d_\varphi$ on a compact subset $J$ of
$\mathbb{R}^d$, set for $x \in J$,

$W_{n,h}(x, \varphi) = \sum_{i=1}^n (c_\varphi(x)\varphi(Y_i) + d_\varphi(x)) K\Bigl(\frac{x - X_i}{h^{1/d}}\Bigr)$,

where $K$ is a kernel with support contained in $[-1/2, 1/2]^d$ such that

$\sup_{x \in \mathbb{R}^d} |K(x)| =: \kappa < \infty$ and $\int_{\mathbb{R}^d} K(s)\,ds = 1$.

For future use introduce two classes of continuous functions on a compact
subset $J$ of $\mathbb{R}^d$ indexed by $\Phi$:

$\mathcal{C} := \{c_\varphi : \varphi \in \Phi\}$ and $\mathcal{D} := \{d_\varphi : \varphi \in \Phi\}$.

We shall always assume that the classes $\mathcal{C}$ and $\mathcal{D}$ are relatively compact
with respect to the sup-norm topology, which by the Arzela-Ascoli
theorem is equivalent to these classes being uniformly bounded and uniformly
equicontinuous.

Theorem 4. Let $I$ be a compact subset of $\mathbb{R}^d$. Assume that $\Phi$ and $K$
satisfy the above conditions and the classes of continuous functions $\mathcal{C}$ and
$\mathcal{D}$ are as above, that is, relatively compact with respect to the sup-norm
topology, where $J = I^\eta$, for some $0 < \eta < 1$. Also assume that

(3.2) $f$ is continuous and strictly positive on $J$.

Further assume that the envelope function $F$ of the class $\Phi$ satisfies

(3.3) $\exists M > 0$, $F(Y)1\{X \in J\} \le M$ a.s.,

or for some $p > 2$,

(3.4) $\alpha := \sup_{z \in J} \mathbb{E}(F^p(Y) \mid X = z) < \infty$.

Then we have for any $c > 0$ and $0 < h_0 < (2\eta)^d$, with probability 1,

(3.5) $\limsup_{n \to \infty} \sup_{c(\log n/n)^\gamma \le h \le h_0} \frac{\sup_{\varphi \in \Phi} \|W_{n,h}(\cdot, \varphi) - \mathbb{E}W_{n,h}(\cdot, \varphi)\|_I}{\sqrt{nh(\log(1/h) \vee \log\log n)}} =: \Gamma(c) < \infty$,

where $\gamma = 1$ in the bounded case and $\gamma = 1 - 2/p$ under assumption (3.4).



Before proving Theorem 4 we shall show how it implies Theorems 2 and
3 and Corollaries 2 and 3. We need the following lemma.

Lemma 2. Let $\mathcal{H}$ be a class of uniformly equicontinuous functions $g : J \to \mathbb{R}$
and let $K : \mathbb{R}^d \to \mathbb{R}$ be a kernel with support in $[-1/2, 1/2]^d$ so that
$\int_{\mathbb{R}^d} K(u)\,du = 1$. Then we have for any sequence of positive constants $b_n \to 0$,

$\sup_{g \in \mathcal{H}} \sup_{0 < h < b_n} \|g * K_h - g\|_I \to 0$.

Proof. A simple transformation shows that if $x \in I$,

$|g(x) - g * K_h(x)| = \Bigl|\int_{[-1/2,1/2]^d} (g(x) - g(x - uh^{1/d}))K(u)\,du\Bigr|$,

which for $h \le b_n$ and all large enough $n$ is obviously bounded above by

$\sup\{|g(x) - g(y)| : x, y \in J, |x - y| \le b_n^{1/d}/2\} \int_{\mathbb{R}^d} |K(u)|\,du$.

Since the function class $\mathcal{H}$ is uniformly equicontinuous, we readily obtain
the assertion of the lemma. □

Proof of Theorem 2. Set

$\hat r_{n,h}(x) = \frac{1}{nh} \sum_{i=1}^n Y_i K((x - X_i)/h^{1/d})$, $x \in I$.

Then we obviously have

(3.6) $|\hat m_{n,h}(x) - \bar r(x, h)/\bar f(x, h)| \le \frac{1}{|\hat f_{n,h}(x)|}|\hat r_{n,h}(x) - \bar r(x, h)| + \frac{|\bar r(x, h)|}{|\hat f_{n,h}(x)\bar f(x, h)|}|\hat f_{n,h}(x) - \bar f(x, h)|$.

From Theorem 4 [setting $r = 1$, $\Phi = \{\varphi_1\}$, where $\varphi_1(y) = y$, $y \in \mathbb{R}$] it now
follows that with probability 1,

(3.7) $\limsup_{n \to \infty} \sup_{c(\log n/n)^\gamma \le h \le b_n} \frac{\sqrt{nh}\,\|\hat r_{n,h}(\cdot) - \bar r(\cdot, h)\|_I}{\sqrt{\log(1/h) \vee \log\log n}} < \infty$

and by (2.11) that

(3.8) $\limsup_{n \to \infty} \sup_{c\log n/n \le h \le b_n} \frac{\sqrt{nh}\,\|\hat f_{n,h}(\cdot) - \bar f(\cdot, h)\|_I}{\sqrt{\log(1/h) \vee \log\log n}} < \infty$.

This last bound of course implies that as $n \to \infty$,

$\sup_{c\log n/n \le h \le b_n} \|\hat f_{n,h}(\cdot) - \bar f(\cdot, h)\|_I = O(1)$ a.s.,

where we can make the constant in the $O(1)$-term arbitrarily small by
choosing $c$ large enough. Combining this observation with the subsequent result
following from Lemma 2 that

(3.9) $\sup_{0 < h \le b_n} \|\bar f(\cdot, h) - f(\cdot)\|_I \to 0$

and the assumption that the density $f$ is positive on $J$, we can conclude
that for $c$ large enough $\hat f_{n,h}$ is bounded away from 0 on $I$, uniformly in
$c\log n/n \le h \le b_n$.

Combining this with (1.8) or (1.10) it follows that
$\sup_{0 < h \le b_n} \|\bar r(\cdot, h)/\bar f(\cdot, h)\|_I$
remains bounded. Therefore, we can infer Theorem 2 from (3.6)-(3.8). □
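
For completeness, the decomposition behind (3.6) can be written out as follows. It uses only that $\hat m_{n,h} = \hat r_{n,h}/\hat f_{n,h}$, which holds because numerator and denominator of the Nadaraya-Watson estimator are both divided by $nh$; arguments $(x)$ and $(x, h)$ are suppressed, and $\hat f_{n,h}$, $\bar f$ are assumed nonzero.

```latex
\begin{align*}
\hat m_{n,h} - \frac{\bar r}{\bar f}
  &= \frac{\hat r_{n,h}}{\hat f_{n,h}} - \frac{\bar r}{\bar f}
   = \frac{\hat r_{n,h} - \bar r}{\hat f_{n,h}}
     + \bar r\Bigl(\frac{1}{\hat f_{n,h}} - \frac{1}{\bar f}\Bigr) \\
  &= \frac{\hat r_{n,h} - \bar r}{\hat f_{n,h}}
     - \frac{\bar r}{\hat f_{n,h}\,\bar f}\,\bigl(\hat f_{n,h} - \bar f\bigr).
\end{align*}
```

Taking absolute values and applying the triangle inequality gives (3.6).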

Proof of Corollary 2. We first note that assumption (1.12) in
conjunction with Scheffé's lemma and also condition (1.10) in the unbounded
case implies that $r(x) = \mathbb{E}(Y \mid X = x)f(x)$ is continuous on $J$. Applying
Lemma 2 with $\mathcal{H} = \{r\}$, we see that

(3.10) $\sup_{0 < h \le b_n} \|\bar r(\cdot, h) - r(\cdot)\|_I \to 0$,

which with (3.9) and Theorem 2 completes the proof of the corollary. □

Proofs of Theorem 3 and Corollary 3. To see how Theorem 3
follows from Theorem 4, set $\Phi = \{\varphi_t : t \in \mathbb{R}\}$, where $\varphi_t(y) = 1\{y \le t\}$, $t, y \in \mathbb{R}$,

$\mathcal{C} = \{1/f(\cdot)\}$ and $\mathcal{D} = \{-F(t \mid \cdot)/f(\cdot) : t \in \mathbb{R}\}$.

The classes $\Phi$ and $\mathcal{C}$ clearly satisfy the assumptions of Theorem 4. To see
that the function class $\mathcal{D}$ is a relatively compact class of continuous functions
on $J$ refer to pages 6 and 7 of [11], which also implies that the class

$\mathcal{H} = \{g_t : t \in \mathbb{R}\}$,

where for each $t \in \mathbb{R}$, $g_t(\cdot) = F(t \mid \cdot)f(\cdot)$, is also a relatively compact class of
functions defined on $J$. Therefore Theorem 3 and Corollary 3 follow in the
same way that Theorem 2 and Corollary 2 did from Theorem 4 and Lemma
2. □



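Proof of Theorem 4. We first note that

$\limsup_{n \to \infty} \sup_{c\log n/n \le h \le h_0} \frac{\sup_{\varphi \in \Phi} \bigl\|d_\varphi(x) \sum_{i=1}^n \{K(\frac{x - X_i}{h^{1/d}}) - \mathbb{E}K(\frac{x - X}{h^{1/d}})\}\bigr\|_I}{\sqrt{nh(\log(1/h) \vee \log\log n)}}$
$\le \sup_{\varphi \in \Phi} \|d_\varphi\|_I \limsup_{n \to \infty} \sup_{c\log n/n \le h \le h_0} \frac{\bigl\|\sum_{i=1}^n \{K(\frac{x - X_i}{h^{1/d}}) - \mathbb{E}K(\frac{x - X}{h^{1/d}})\}\bigr\|_I}{\sqrt{nh(\log(1/h) \vee \log\log n)}}$.

In view of (2.11) it is obvious that this quantity is finite with probability 1.
Therefore, if we set for $\varphi \in \Phi$, $x \in I$ and $h > 0$,

(3.11) $\eta_{\varphi,n,h}(x) = c_\varphi(x) \sum_{i=1}^n \varphi(Y_i) K\Bigl(\frac{x - X_i}{h^{1/d}}\Bigr)$,

it clearly suffices to show:

Proposition 2. Under the assumptions of Theorem 4, for all $c > 0$,
there exists a $Q(c) > 0$ such that with probability 1,

(3.12) $\limsup_{n \to \infty} \sup_{c(\log n/n)^\gamma \le h \le h_0} \frac{\sup_{\varphi \in \Phi} \|\eta_{\varphi,n,h} - \mathbb{E}\eta_{\varphi,n,h}\|_I}{\sqrt{nh(\log(1/h) \vee \log\log n)}} = Q(c)$.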

Proof. We shall prove Proposition 2 under assumption (3.4), as it
follows in the bounded case directly from the proof of Theorem 1 and Remark
9. Just replace the classes $\mathcal{K}_{j,k}$ by the classes

$\mathcal{G}_{j,k} = \{(y, z) \mapsto \varphi(y)c_\varphi(x)K((x - z)/h^{1/d}) : \varphi \in \Phi, x \in I, h_{j,k} \le h \le h_{j+1,k}\}$.

Observe that these classes also satisfy conditions (i)-(iii) of Proposition 1
with $G = \beta = \kappa\sup_{\varphi \in \Phi} \|c_\varphi\|_I$ and our proof of Theorem 1 still works after
some minor modifications.

We turn to the unbounded case, that is, assume (3.4) for some $p > 2$.
Recall that $\gamma = 1 - 2/p$. For any $k = 1, 2, \ldots$ and $\varphi \in \Phi$, set $n_k = 2^k$,
$a_k = c(\log n_k/n_k)^\gamma$ and

(3.13) $\varphi_k(y) = \varphi(y)1\{F(y) < (n_k/k)^{1/p}\}$.

For $n_{k-1} \le n \le n_k$, $x \in I$, $a_k \le h \le h_0$ and $\varphi \in \Phi$, let

(3.14) $\eta^{(k)}_{\varphi,n,h}(x) = c_\varphi(x) \sum_{i=1}^n \varphi_k(Y_i) K\Bigl(\frac{x - X_i}{h^{1/d}}\Bigr)$.

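The proof of Proposition 2 in the unbounded case will be a consequence of
two lemmas. We will first show:

Lemma 3. There exists a constant $Q_1(c) < \infty$, such that with probability
1,

(3.15) $\limsup_{k \to \infty} \Delta_k = Q_1(c)$,

where

$\Delta_k = \max_{n_{k-1} \le n \le n_k} \sup_{a_k \le h \le h_0} \frac{\sup_{\varphi \in \Phi} \|\eta^{(k)}_{\varphi,n,h} - \mathbb{E}\eta^{(k)}_{\varphi,n,h}\|_I}{\sqrt{nh(\log(1/h) \vee \log\log n)}}$.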

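Proof. For $x \in I$, $0 < h < h_0$ and $\varphi \in \Phi$ let

(3.16) $v_{\varphi,h,x}(u, v) = c_\varphi(x)\varphi(v)K\Bigl(\frac{x - u}{h^{1/d}}\Bigr)$

and

(3.17) $v^{(k)}_{\varphi,h,x}(u, v) = c_\varphi(x)\varphi_k(v)K\Bigl(\frac{x - u}{h^{1/d}}\Bigr)$.

Notice that

(3.18) $\eta^{(k)}_{\varphi,n,h}(x) - \mathbb{E}\eta^{(k)}_{\varphi,n,h}(x) = \sqrt{n}\alpha_n(v^{(k)}_{\varphi,h,x})$,

where $\alpha_n$ is the empirical process based on $(X_1, Y_1), \ldots, (X_n, Y_n)$. For $k \ge 1$
let

$\mathcal{G}_k(h) := \{v^{(k)}_{\varphi,h,x} : \varphi \in \Phi \text{ and } x \in I\}$.

We see that

$\Delta_k = \max_{n_{k-1} \le n \le n_k} \sup_{a_k \le h \le h_0} \frac{\|\sqrt{n}\alpha_n\|_{\mathcal{G}_k(h)}}{\sqrt{nh(\log(1/h) \vee \log\log n)}}$.

Note that for each $v^{(k)}_{\varphi,h,x} \in \mathcal{G}_k(h)$,

(3.19) $\|v^{(k)}_{\varphi,h,x}\|_\infty \le \sup_{\varphi \in \Phi} \|c_\varphi\|_I\,\kappa\,(n_k/k)^{1/p} =: D_0(n_k/k)^{1/p}$.

Also observe that

$\mathbb{E}[(v^{(k)}_{\varphi,h,x})^2(X, Y)] \le \mathbb{E}\Bigl[c_\varphi^2(x)F^2(Y)K^2\Bigl(\frac{x - X}{h^{1/d}}\Bigr)\Bigr]$.

Using a conditioning argument, we infer that this last term is uniformly over
$x \in I$

$\le \|c_\varphi\|_I^2\,\alpha^{2/p} \int_{\mathbb{R}^d} f_X(t)K^2\Bigl(\frac{x - t}{h^{1/d}}\Bigr)\,dt$

$\le h\|c_\varphi\|_I^2\,\alpha^{2/p} \int_{[-1/2,1/2]^d} f_X(x - h^{1/d}u)K^2(u)\,du$

$\le h\alpha^{2/p} \sup_{\varphi \in \Phi} \|c_\varphi\|_I^2\,\|f_X\|_J\,\|K\|_2^2 =: hD_1$.

Thus

(3.20) $\sup_{v \in \mathcal{G}_k(h)} \mathbb{E}v^2(X, Y) \le hD_1$.

Set for $j, k \ge 0$, $h_{j,k} = 2^ja_k$ and

$\mathcal{G}_{j,k} := \{v^{(k)}_{\varphi,h,x} : \varphi \in \Phi, x \in I \text{ and } h_{j,k} \le h \le h_{j+1,k}\}$.

Clearly by (3.20), for all $x \in I$, $\varphi \in \Phi$ and $h_{j,k} \le h \le h_{j+1,k}$,

$\mathbb{E}[(v^{(k)}_{\varphi,h,x})^2(X, Y)] \le hD_1 \le 2D_1h_{j,k}$,

which gives

(3.21) $\sup_{v \in \mathcal{G}_{j,k}} \mathbb{E}v^2(X, Y) \le 2D_1h_{j,k}$.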



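We shall use Corollary 4 to bound $\mathbb{E}\|\sum_{i=1}^{n_k} \varepsilon_i v(X_i, Y_i)\|_{\mathcal{G}_{j,k}}$. Note first that
by arguing as in the proof of Lemma 5 of EM [11], each $\mathcal{G}_{j,k} \subset \mathcal{G}$, where $\mathcal{G}$ is
a class that satisfies (K.iii) and (K.iv) with $\mathcal{K}$ replaced by $\mathcal{G}$. Next by setting
$U = D_0(n_k/k)^{1/p}$ and $\beta^2 = \alpha^{2/p}$ it follows that

$\mathbb{E}\Bigl\|\sum_{i=1}^{n_k} \varepsilon_i v(X_i, Y_i)\Bigr\|_{\mathcal{G}_{j,k}} \le A\sqrt{\nu n_k\sigma_0^2\log(C_1\beta/\sigma_0)} + 2A\nu U\log(C_3n_k(\beta/U)^2)$.

Replacing $\sigma_0^2$ in the first term by the upper bound $2D_1h_{j,k} \wedge \beta^2$ and recalling
that

$h_{j,k} \ge a_k = c(\log n_k/n_k)^{1 - 2/p}$,

we obtain after a small calculation that for suitable positive constants $D_2$
and $D_3$,

(3.22) $\mathbb{E}\Bigl\|\sum_{i=1}^{n_k} \varepsilon_i v(X_i, Y_i)\Bigr\|_{\mathcal{G}_{j,k}} \le D_3\sqrt{n_kh_{j,k}\log((D_2h_{j,k})^{-1} \vee C_1)}$.

Set

$a_{j,k} = \sqrt{n_kh_{j,k}\Bigl(\log\frac{1}{h_{j,k}} \vee \log\log n_k\Bigr)}$, $k \ge 1$, $j \ge 0$.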



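Applying Talagrand's inequality with $M = D_0(n_k/k)^{1/p}$ and
$\sigma_{\mathcal{G}}^2 = \sigma_{\mathcal{G}_{j,k}}^2 \le 2D_1h_{j,k}$, we get for any $t > 0$ and large enough $k$,

$\mathbb{P}\Bigl\{\max_{n_{k-1} \le n \le n_k} \|\sqrt{n}\alpha_n\|_{\mathcal{G}_{j,k}} \ge A_1(D_3a_{j,k} + t)\Bigr\} \le 2[\exp(-A_2t^2/(2D_1n_kh_{j,k})) + \exp(-A_2tk^{1/p}/(D_0n_k^{1/p}))]$.

Set for any $\rho > 1$, $j \ge 0$ and $k \ge 1$,

$p_{j,k}(\rho) = \mathbb{P}\Bigl\{\max_{n_{k-1} \le n \le n_k} \|\sqrt{n}\alpha_n\|_{\mathcal{G}_{j,k}} \ge A_1(D_3 + \rho)a_{j,k}\Bigr\}$.

Using the facts that $a_{j,k}^2/(n_kh_{j,k}) \ge \log\log n_k$ and $h_{j,k} \ge c(\log n_k/n_k)^{1 - 2/p}$,
we readily obtain for large $k$ and $j \ge 0$ that

$p_{j,k}(\rho) \le 2\exp\Bigl(-\frac{\rho^2A_2}{2D_1}\log\log n_k\Bigr) + 2\exp\Bigl(-\frac{\sqrt{c}A_2\rho}{D_0}\sqrt{\log n_k\log\log n_k}\Bigr)$,

which for $\rho' = \rho^2A_2/(2D_1)$ and large $k$ is less than or equal to $4(\log n_k)^{-\rho'}$. Set
$l_k = \max\{j : h_{j,k} \le 2h_0\}$ if this set is nonempty, which is obviously the case
for large enough $k$. Then we have $l_k \le k$ for large enough $k$ and, consequently,

$P_k(\rho) := \sum_{j=0}^{l_k - 1} p_{j,k}(\rho) \le 4l_k(\log n_k)^{-\rho'} \le k^{-2}$,

provided we have chosen $\rho$ large enough.

Further notice that by the definition of $l_k$ for large $k$,

$2h_{l_k,k} = h_{l_k+1,k} \ge 2h_0$,

which implies that we have for $n_{k-1} \le n \le n_k$,

$[a_k, h_0] \subset [a_k, h_{l_k,k}]$.

Thus for all large enough $k$,

$A_k(\rho) := \{\Delta_k \ge 2A_1(D_3 + \rho)\} \subset \bigcup_{j=0}^{l_k - 1} \Bigl\{\max_{n_{k-1} \le n \le n_k} \|\sqrt{n}\alpha_n\|_{\mathcal{G}_{j,k}} \ge A_1(D_3 + \rho)a_{j,k}\Bigr\}$.

It follows now with the above choice for $\rho$,

$\mathbb{P}(A_k(\rho)) \le P_k(\rho) \le k^{-2}$,

which by the Borel-Cantelli lemma implies Lemma 3. □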
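Write

(3.23) $\bar\varphi_k(y) = \varphi(y)1\{F(y) \ge (n_k/k)^{1/p}\}$.

For $\varphi \in \Phi$, $x \in I$ and $n_{k-1} \le n \le n_k$, let

(3.24) $\bar\eta^{(k)}_{\varphi,n,h}(x) = c_\varphi(x) \sum_{i=1}^n \bar\varphi_k(Y_i) K\Bigl(\frac{x - X_i}{h^{1/d}}\Bigr)$.

Lemma 4. With probability 1,

(3.25) $\lim_{k \to \infty} \max_{n_{k-1} \le n \le n_k} \sup_{c(\log n/n)^{1 - 2/p} \le h \le h_0} \frac{\sup_{\varphi \in \Phi} \|\bar\eta^{(k)}_{\varphi,n,h} - \mathbb{E}\bar\eta^{(k)}_{\varphi,n,h}\|_I}{\sqrt{nh(\log(1/h) \vee \log\log n)}} = 0$.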

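Proof. First note that for any $h \le h_0$, $\varphi \in \Phi$ and $n_{k-1} \le n \le n_k$,

$\|\mathbb{E}\bar\eta^{(k)}_{\varphi,n,h}\|_I \le \kappa \sup_{\varphi \in \Phi} \|c_\varphi\|_I\, n_k\, \mathbb{E}[F(Y)1\{X \in J, F(Y) > (n_k/k)^{1/p}\}]$.

We further have by (3.4),

$\mathbb{E}F^p(Y)1\{X \in J\} < \infty$,

and we see that uniformly in $n_{k-1} \le n \le n_k$, $h \le h_0$ and $\varphi \in \Phi$,

$\|\mathbb{E}\bar\eta^{(k)}_{\varphi,n,h}\|_I = o(n_k^{1/p}k^{1 - 1/p}) = o(\sqrt{n_ka_k\log(1/a_k)})$

as $k \to \infty$, where $a_k = c(\log n_k/n_k)^{1 - 2/p}$.

By monotonicity of the function $h \mapsto h\log(1/h)$, $h \le 1/e$, we readily obtain
that

(3.26) $\lim_{k \to \infty} \max_{n_{k-1} \le n \le n_k} \sup_{a_k \le h \le h_0} \frac{\sup_{\varphi \in \Phi} \|\mathbb{E}\bar\eta^{(k)}_{\varphi,n,h}\|_I}{\sqrt{nh(\log(1/h) \vee \log\log n)}} = 0$.

It remains to be shown that, with probability 1,

(3.27) $\lim_{k \to \infty} \max_{n_{k-1} \le n \le n_k} \sup_{a_k \le h \le h_0} \frac{\sup_{\varphi \in \Phi} \|\bar\eta^{(k)}_{\varphi,n,h}\|_I}{\sqrt{nh(\log(1/h) \vee \log\log n)}} = 0$.

Similarly as above we have

$\max_{n_{k-1} \le n \le n_k} \sup_{a_k \le h \le h_0} \sup_{\varphi \in \Phi} \|\bar\eta^{(k)}_{\varphi,n,h}\|_I \le \kappa \sup_{\varphi \in \Phi} \|c_\varphi\|_I \sum_{i=1}^{n_k} F(Y_i)1\{X_i \in J, F(Y_i) > (n_k/k)^{1/p}\}$.

Inspecting the proof of Lemma 1 of [11], we see that the argument there also
applies if we set $h_n = c(\log n/n)^{1 - 2/p}$ to give

$\sum_{i=1}^{n_k} F(Y_i)1\{X_i \in J, F(Y_i) > (n_k/k)^{1/p}\} = o(\sqrt{n_ka_k\log(1/a_k)})$,

as $k \to \infty$, and we see by the same argument as in (3.26) that (3.27) holds,
thereby finishing the proof of Lemma 4. □

Proposition 2 now follows from Lemmas 3 and 4. □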